Bayesian and Non-Bayesian Evidential Updating*

1.  Recent work in both vision systems (Wesley) and in
knowledge representation (Lowrance, Barnett, Quinlan, Dillard)
has employed an alternative, often referred to as Dempster/Shafer
updating, to classical Bayesian updating of uncertain knowledge.
Various other investigators have gone beyond classical Bayesian
conditionalization (MYCIN, EMYCIN, DENDRAL, ...) but in a less
systematic manner.  It is appropriate to examine the formal
relations between various Bayesian and non-Bayesian approaches to
what has come to be called evidence theory, in order to explore
the question of whether the new techniques are really more
powerful than the old, and the question of whether, if they are,
this increment of power is bought at too high a price.


2.  Orthodox probability theory supposes (1) that we
commence with known statistical distributions, (2) that these
distributions are such as to give rise to real-valued
probabilities, and (3) that these probabilities can be modified
by using Bayes' theorem to conditionalize on evidence that is
taken to be certain.  There are thus three ways to modify the
classical theory.

We may dispense with the supposition that we are dealing
with known statistical distributions.  The best known advocate of
[...], based on the fact that the
opinions of most people are such that, faced with frequency data,
they will converge reasonably rapidly.  Furthermore, in practice,
it is common to recognize that some opinions are better than
others, and to use as prior distributions in statistical inference
distributions representing the opinions of knowledgeable experts.
This approach has been incorporated in some expert systems, for
example, PROSPECTOR.  It has both virtues and limitations.  A
purely pragmatic virtue is that it allows us to get on with our
business even when we don't have the knowledge of prior
distributions we would like to have.  It has the practical virtue
that the considered opinions of genuinely knowledgeable experts
are formed in response to, and reflect with some degree of
accuracy, relative frequencies in nature.  But it has two
drawbacks: it does not incorporate any indication of whether the
opinion is a wild guess or a considered judgment based on long
experience; and it calls for expert opinions even in the face of
total, acknowledged ignorance.

This suggests the second departure from the classical
picture: abandoning the assumption that our probabilities are
point-valued.  This has recently been hailed as a novel departure
(Lowrance, 1982, p. 21; Garvey, et al., 1981, p. 319; Dillard,
1982, p. 1; Lowrance and Garvey, 1982, p. 7; Wesley and Hanson,
1982, p. 16; Quinlan, 1982, p. 9).  The idea of representing
probabilities by intervals is not new (cf. Kyburg, Good, Levi,
Smith), and the notion of probabilities that constitute a field
richer than that of the real numbers goes back even further
(Keynes, 1921, pp. 30-40, offers a formal philosophical treatment
of such entities; B. O. Koopman, 1941, 1942, offers a mathematical
characterization).  Even the standard subjectivistic or
personalistic view of probability can be construed in this way:
while each person has a set of real-valued probabilities defined
over a given field, a group of people will reflect a set of
probability functions defined over the field.  We may quite
reasonably focus our attention on the maximum and minimum of these
functions evaluated at a member of the field.*

In general the representation in terms of intervals seems
superior to the representation in terms of point values.  Even in
the ideal case, in which all of our measures are based on
statistical inference from suitably massive quantities of data, it
is most natural to construe these measures as being constrained by
intervals.  In confidence interval estimation, for example, what
we get from our statistics is a high confidence that a given
parameter is contained in a certain interval.  This translates
neatly and conveniently into an interval constraint.  The results
of statistical inference should reflect indeterminacy or
vagueness.  What we can properly claim to know is not that a
parameter has a certain value, but (with probability or high
confidence) that it lies within certain limits.  This limitation
of human knowledge should surely be mirrored in computer based
systems.

The third departure from the classical scheme is to consider
alternatives to Bayes' theorem as a way of updating probabilities
in the light of new evidence.  This departure is recent, and was
first stated in Dempster, 1967.  Dempster's novel rule of
combination, subsequently adopted by Shafer (1976), is often
referred to as a "generalization" of Bayesian inference (Shafer,
1981, p. 337: "The theory of belief functions ... is a
thoroughgoing generalization of the Bayesian theory"; Lowrance
and Garvey, 1982, p. 9: "Dempster's rule can be viewed as a direct
generalization of Bayes' rule"; Dillard, 1982, p. 1; Garvey,
et al., p. 319; Lowrance 1982, p. 21).  This suggests, on the one
hand, that Bayes' rule can be regarded as a special or limiting
case of Dempster's rule, which is true, and on the other hand that
Dempster's rule can be applied where Bayes' rule cannot, which is
false.  Dempster himself recognizes (1967, 1968) that his rule
results from the imposition of additional constraints on the
Bayesian analysis (see note 5).

One criticism of the usual Bayesian approach to evidential
updating is the quantity of information that may be required to
specify the probability function covering the field of
propositions with which we are concerned.  This may be empirical
information (if the underlying probabilities are thought of as
being based on statistical knowledge), psychological information
(if a personalistic interpretation of probability is adopted), or
logical information (if we interpret probability as degree of
confirmation, a la Carnap 1951).  Suppose we consider a field of
propositions based on the logically independent propositions
$p_1, \ldots, p_n$; the set of what Carnap called "state
descriptions" induced by this basis consists of $2^n$ atoms, each
of which is the conjunction of the $n$ (negated or unnegated)
$p_i$.  It is obvious that for reasonably large $n$ the assignment
of probabilities to these atoms presents great difficulties.  But
once we have those $2^n$ numbers, we're done: we can calculate all
conditional probabilities as well as the probability of any
proposition in the field based on $p_1, \ldots, p_n$.

Is there a saving in effort if we go to a Dempster/Shafer
system?  Using the handy representation in Shafer (1976), we take
$\Theta$, the universal set, to be the set of all $2^n$
possibilities represented by the state descriptions, and assign a
mass to each subset of $\Theta$.  This requires $2^{2^n}$
assignments!  As far as the number of parameters to be taken
account of is concerned, we are exponentially worse off.  But if
we construe probabilities as intervals, or represent them by sets
of simple probability functions, we are just as badly off.  (For
an example relating mass assignments to interval assignments, see
table I in the appendix.  For the general equivalence, see
theorem 1 below.)  Dillard (p. 4) refers to "computational
limitations" and Lowrance and Garvey (1982) mention that with
large $\Theta$, maintaining the model is "computationally
infeasible".

In either case, we need to find some systematic and
computationally feasible procedure for obtaining the masses or
probabilities we need.  Bayesian and non-Bayesian approaches are
in essentially the same difficult situation in this respect,
although there are often plausible ways of systematizing the
parameter assignments on either view.
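
The growth in the number of parameters can be made concrete with a
small sketch.  The function names are illustrative and not part of
the original discussion; the code only tabulates, for n basis
propositions, the number of atoms a Bayesian must weight and the
number of subsets of the frame that may carry mass.

```python
def num_atoms(n: int) -> int:
    """Number of state descriptions generated by n independent propositions."""
    return 2 ** n


def num_subsets(n: int) -> int:
    """Number of subsets of the frame, each of which may be assigned a mass."""
    return 2 ** num_atoms(n)


if __name__ == "__main__":
    # Columns: n, atoms (2^n), subsets of the frame (2^(2^n)).
    for n in range(1, 6):
        print(n, num_atoms(n), num_subsets(n))
```

Already at n = 5 there are 32 atoms but more than four billion
subsets, which is the sense in which the mass representation is
exponentially worse off.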

3.  Whether the representation of our initial knowledge
state is given by an assignment of masses to subsets of $\Theta$
or by a set of classical probability distributions over the atoms
of $\Theta$, it is important that these masses or probabilities be
justifiable.  As already suggested, a straightforward way of
obtaining them is through statistical inference, which (when
possible) yields interval valued estimates of relative
frequencies, but there may also be other ways to obtain masses or
intervals of probability.  If so, then the deep and difficult
problem arises of how to combine both statistical and
non-statistical sources of information.

It has been suggested that Dempster/Shafer updating relieves
us of the necessity of making assumptions about the joint
probabilities of the objects we are concerned about.  Thus,
Quinlan claims that INFERNO "makes no assumptions whatever about
the joint probability distributions of pieces of knowledge ..."
(Quinlan 1982).  Other writers have made similar claims, e.g.,
Wesley and Hanson, 1982, p. 15.  (To make independence assumptions
is exactly to make assumptions about joint probability
distributions.)

It is clear that the assignment of masses to subsets of
$\Theta$ involves just as much in the way of "assumptions" as the
assignment of a priori probabilities to the corresponding
propositions.  In view of the reducibility of the Dempster/Shafer
formalism to the formalism provided by convex sets of classical
probability functions (to be shown below), moreover, we may
recapture the assumptions about joint probability distributions
from the convex Bayesian representation.


4.  One important novelty claimed for the Dempster/Shafer
system is its ability to handle uncertain evidence.  But even this
is not in itself anti-Bayesian.  There are also Bayesian methods
for handling uncertain evidence.  One of these, used in PROSPECTOR
and mentioned by Lowrance (1982, p. 17), is known in the
philosophical world as Jeffrey's rule.  (It is presented and
discussed in Jeffrey, 1965.)  It follows from Bayes' theorem that

\[ P(A) = P(A \mid B)\,P(B) + P(A \mid \neg B)\,P(\neg B). \]

If you adopt a new (coherent) probability function $P'$, there are
essentially no constraints on $P'(A)$.  But one often confronts
situations where a shift in probability originates in the
assignment of a new probability to $B$ that should not affect the
conditional probability of $A$ given $B$:
$P(A \mid B) = P'(A \mid B)$.  We have learned something new about
$B$, but we haven't learned anything new about the bearing of the
truth of $B$ on the truth of $A$.

Given such a situation, a shift in the probability of $B$ from
$P(B)$ to $P'(B)$, resulting from new evidence, should propagate
itself according to:

\[ P'(A) = P(A \mid B)\,P'(B) + P(A \mid \neg B)\,P'(\neg B). \]

When new evidence leads us to shift our credence in $B$ from
$P(B)$ to $P'(B)$, a corresponding shift in probability is induced
for every other proposition in the field: the new probability of a
proposition $A$ is the weighted average of the probability of $A$
given $B$ and the probability of $A$ given not-$B$, weighted by
the new probabilities of $B$ and not-$B$.
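
Jeffrey's rule as just stated is a one-line computation.  The
following is a minimal sketch; the function name and the numbers in
the example are illustrative assumptions, not taken from any of the
systems cited.

```python
def jeffrey_update(p_a_given_b: float, p_a_given_not_b: float,
                   new_p_b: float) -> float:
    """Return P'(A) when P(B) shifts to new_p_b.

    The conditional probabilities P(A|B) and P(A|not-B) are held
    fixed, which is the assumption behind Jeffrey's rule.
    """
    if not 0.0 <= new_p_b <= 1.0:
        raise ValueError("new_p_b must lie in [0, 1]")
    return p_a_given_b * new_p_b + p_a_given_not_b * (1.0 - new_p_b)


if __name__ == "__main__":
    # Hypothetical values: P(A|B) = 0.9, P(A|not-B) = 0.2.
    print(jeffrey_update(0.9, 0.2, 0.5))  # before the shift, P(B) = 0.5
    print(jeffrey_update(0.9, 0.2, 0.8))  # after the shift, P'(B) = 0.8
    print(jeffrey_update(0.9, 0.2, 1.0))  # B certain: P'(A) is P(A|B)
```

When the new probability of B is 1 the rule reduces to ordinary
conditionalization on B, which is why it can be regarded as a
Bayesian method for uncertain evidence.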




Lowrance (1982) worries about the problem of iterating this
move.  Having made it, should we then update the probability of
$B$ in the light of the new probability $P'(A)$?  Wesley and
Hanson (1982, p. 15) worry about a potential "violation of Bayes'
law".  But what is offered is not a relaxation method; it is a
method of evaluating the impact of evidence which warrants a shift
in the support for $B$.  It makes no sense to consider updating
$P'(B)$ in the light of the new value of $P(A)$; $P'(B)$ is the
source of the updating.  No contradiction lurks here.  But there
is a difficulty for mechanical updating: the notion of a source is
clear to us, but may not even be represented in an artificial
system.

Other Bayesian updating procedures are possible (cf. Hartry
Field, 1978; Diaconis and Zabell, 1982), but it is hard to think
of one so simple and often so natural.  This is particularly true
in the epistemological framework considered by Shafer; the weights
of the subsets of $\Theta$ assigned masses reflect our a priori
intuitions; there is no way in which the values of these masses,
given our observations, can be changed without changing the model
entirely.  What impact given evidence has should not also change
according to the evidence we happen to have.  Shafer himself has
explored the relation between Jeffrey's rule and his own updating
recommendations in (Shafer 1981).

5.  In order to investigate more closely the relations
between the Bayesian and Dempster/Shafer strategies for updating,
it will be helpful to have several formal results.  In the present
section we establish the partial equivalence between the
assignment of masses to subsets of $\Theta$ (the space of
possibilities) and the assignment of a convex set of simple
classical probability functions defined over the atoms of
$\Theta$.  The equivalence is only partial, since some plausible
situations do not have a natural representation in terms of mass
functions.

Shafer's belief functions are defined relative to a frame of
discernment $\Theta$, and are given by either a belief function or
a mass function defined over the subsets of $\Theta$.  The atoms
of $\Theta$ are the most specific states of affairs that concern
us in a given context.  The belief function Bel and the mass
function $m$ are related by:

\[ \mathrm{Bel}(X) = \sum_{A \subset X} m(A) \]

(Throughout, "$\subset$" is to be understood as allowing improper
inclusion.  Proofs have been relegated to Appendix A.)

Our first observation is that to every belief function
defined over a frame of discernment, there corresponds a closed
set of classical probability functions defined over the atoms
of $\Theta$ such that for any $X \subset \Theta$,

\[ \mathrm{Bel}(X) = \min_{P} P(X), \]

where the minimum is taken over the probability functions $P$ in
that set.

This result is stated as Theorem 1 in appendix A, and proved
there.  The proof gives a way of constructing members of the set
of classical probability functions, but the intuitive idea is
simply this: Consider a set $X$, to which is assigned mass $m(X)$.
That mass may be construed as probability mass that may be
assigned in any way (subject to other constraints) to the atoms
of $X$.  We obtain the set of classical probability functions that
corresponds to the mass function $m$ by considering all the ways
in which the mass that is not assigned to atoms by $m$ can be
assigned to atoms while maintaining the constraints imposed by the
assignment of mass to sets of atoms.  Tables I and II in the
appendix show both the general and a specific computation for a
simple four-atom frame of discernment.

An example that shows the converse does not hold is the
following: Consider a compound experiment consisting of either
(1) tossing a fair coin twice, or (2) drawing a coin from a bag
containing 40% two headed and 60% two tailed coins and tossing it
twice.  The two parts (1) and (2) are performed in some unknown
ratio $p$, so that, for example, the probability that the first
toss lands heads is $p \cdot 1/2 + (1-p) \cdot 0.4$,
$0 \le p \le 1$.  Let $A$ be the event that the first toss lands
heads, and $B$ the event that the second toss lands tails.  The
representation by a convex set of probability functions is
straightforward, but where $P_*$ is the minimum of the
probabilities in that set,

\[ P_*(A \cup B) = 0.75 < 0.9 = P_*(A) + P_*(B) - P_*(A \cap B) = 0.4 + 0.5 - 0. \]

By theorem 2.1 of Shafer 1976,
$\mathrm{Bel}(A \cup B) \ge \mathrm{Bel}(A) + \mathrm{Bel}(B) - \mathrm{Bel}(A \cap B)$;
$P_*$ is therefore not a belief function.  It is possible to
compute a mass function, but the masses assigned to the union of
any three atoms must be negative.
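
The figures in this example can be checked mechanically.  The sketch
below is illustrative only: the atom labels, the function names, and
the use of Moebius inversion to recover the would-be masses are
choices made here, not the construction of the appendix.

```python
from fractions import Fraction
from itertools import combinations

ATOMS = ("HH", "HT", "TH", "TT")  # outcomes of the two tosses


def distribution(p):
    """Atom probabilities when part (1) is performed with ratio p."""
    p = Fraction(p)
    fair = Fraction(1, 4)
    return {
        "HH": p * fair + (1 - p) * Fraction(2, 5),
        "HT": p * fair,
        "TH": p * fair,
        "TT": p * fair + (1 - p) * Fraction(3, 5),
    }


# Every event probability is linear in p, so its minimum over
# 0 <= p <= 1 is attained at p = 0 or p = 1.
EXTREMES = (distribution(0), distribution(1))


def lower(event):
    """Lower probability P_* of a set of atoms."""
    return min(sum(d[a] for a in event) for d in EXTREMES)


def moebius_masses():
    """Masses whose sums over subsets reproduce P_* (Moebius inversion)."""
    masses = {}
    for size in range(1, len(ATOMS) + 1):
        for combo in combinations(ATOMS, size):
            event = frozenset(combo)
            already = sum(m for s, m in masses.items() if s < event)
            masses[event] = lower(event) - already
    return masses


if __name__ == "__main__":
    A = frozenset({"HH", "HT"})  # first toss lands heads
    B = frozenset({"HT", "TT"})  # second toss lands tails
    print(lower(A | B), lower(A) + lower(B) - lower(A & B))
    for event, mass in moebius_masses().items():
        if len(event) == 3:
            print(sorted(event), mass)
```

Exact rational arithmetic is used so that the comparison of 0.75
with 0.9, and the sign of each three-atom mass, do not depend on
floating-point rounding.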

Subject to the condition, however, that
$P_*(A \cup B) \ge P_*(A) + P_*(B) - P_*(A \cap B)$, we can
represent any closed convex set of classical probability functions
by a Shafer mass function.  This is theorem 2 in the appendix.

These two theorems show that the representation of uncertain
knowledge provided by Shafer's probability mass functions is
equivalent to a representation provided by a convex set of
classical probability functions, and that the representation of
uncertain knowledge by a convex set of classical probability
functions is equivalent to a representation provided by a
probability mass function so long as the convex set of
probability functions satisfies the general relation
$P_*(A \cup B) \ge P_*(A) + P_*(B) - P_*(A \cap B)$.

6.  Of more interest than the mere representation of belief
is the possibility of representing the way that beliefs should
change in response to new evidence.  Thus what we propose to look
at in this section is the relation between Dempster/Shafer
updating and convex Bayesian updating.  We shall first look at
the relation in the case of evidence that is "certain", and then
we shall look at it in the case of "uncertain" evidence.

Suppose that our beliefs can be represented either by a
closed convex set of classical probability functions $S$, or by a
Shafer mass function.  Let $B$ be evidence assigned probability 1,
or support 1.  Shafer defines upper and lower conditional support
functions thus:

\[ \mathrm{Bel}(A \mid B) = \frac{\mathrm{Bel}(A \cup \neg B) - \mathrm{Bel}(\neg B)}{1 - \mathrm{Bel}(\neg B)} \]

\[ P^*(A \mid B) = \frac{P^*(A \cap B)}{P^*(B)}, \]

where $P^*(X) = 1 - \mathrm{Bel}(\neg X)$ is called the
plausibility of $X$.

Theorem 3 in the appendix shows that the following
inequalities hold:

\[ \min_{P \in S} P(A \mid B) \le \mathrm{Bel}(A \mid B) \le P^*(A \mid B) \le \max_{P \in S} P(A \mid B). \]
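
The chain of inequalities can be illustrated on a small frame.  The
sketch below is a hedged example: the mass assignment is
hypothetical, and the closed forms used for the convex Bayesian
bounds, Bel(A∩B)/(Bel(A∩B) + P*(B−A)) and
P*(A∩B)/(P*(A∩B) + Bel(B−A)), are an assumption of this sketch
rather than a quotation from the appendix.

```python
from fractions import Fraction as F


def bel(mass, event):
    """Belief: total mass on non-empty focal sets contained in event."""
    return sum((m for s, m in mass.items() if s and s <= event), F(0))


def pl(mass, event):
    """Plausibility: total mass on focal sets that intersect event."""
    return sum((m for s, m in mass.items() if s & event), F(0))


def dempster_conditional(mass, frame, a, b):
    """Shafer's lower and upper conditional support for A given B."""
    not_b = frame - b
    lower = (bel(mass, a | not_b) - bel(mass, not_b)) / (1 - bel(mass, not_b))
    upper = pl(mass, a & b) / pl(mass, b)
    return lower, upper


def bayes_envelope(mass, a, b):
    """Assumed closed forms for min and max of P(A|B) over the convex set."""
    lower = bel(mass, a & b) / (bel(mass, a & b) + pl(mass, b - a))
    upper = pl(mass, a & b) / (pl(mass, a & b) + bel(mass, b - a))
    return lower, upper


FRAME = frozenset({1, 2, 3, 4})
A = frozenset({1, 2})
B = frozenset({1, 3})
# Hypothetical masses (they sum to 1).
MASS = {
    frozenset({1}): F(1, 5),
    frozenset({3}): F(1, 5),
    frozenset({1, 2}): F(1, 10),
    frozenset({2, 3}): F(1, 10),
    frozenset({1, 3}): F(1, 5),
    FRAME: F(1, 5),
}

if __name__ == "__main__":
    print(bayes_envelope(MASS, A, B))
    print(dempster_conditional(MASS, FRAME, A, B))
```

For these masses the Dempster interval lies strictly inside the
convex Bayesian interval, which is the situation the theorem
describes as possible.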


For the case of a frame of discernment with four atoms,
illustrated in table I of the appendix, we have the following,
where $x_i$ is the mass assigned to the set $i$ in $\Theta$,
$x_{ij}$ is the mass assigned to the union of sets $i$ and $j$,
etc., and $x_\Theta$ is the mass assigned to $\Theta$ itself.

\[ \min P(A \mid B) = \frac{x_1}{(x_1+x_3)+(x_{13}+x_{23}+x_{34})+(x_{123}+x_{134}+x_{234})+x_\Theta} \]

\[ \mathrm{Bel}(A \mid B) = \frac{x_1+[x_{12}+x_{14}+x_{124}]}{(x_1+x_3)+(x_{13}+x_{23}+x_{34})+(x_{123}+x_{134}+x_{234})+x_\Theta+[x_{12}+x_{14}+x_{124}]} \]

\[ P^*(A \mid B) = \frac{x_1+(x_{12}+x_{13}+x_{14})+(x_{123}+x_{124}+x_{134})+x_\Theta}{(x_1+x_3)+(x_{12}+x_{13}+x_{14})+(x_{123}+x_{124}+x_{134})+x_\Theta+[x_{23}+x_{34}+x_{234}]} \]

\[ \max P(A \mid B) = \frac{x_1+(x_{12}+x_{13}+x_{14})+(x_{123}+x_{124}+x_{134})+x_\Theta}{(x_1+x_3)+(x_{12}+x_{13}+x_{14})+(x_{123}+x_{124}+x_{134})+x_\Theta} \]

We observe that:

(1)  $\min P(A \mid B) = \mathrm{Bel}(A \mid B)$  iff  $x_{12}+x_{14}+x_{124} = 0$

(2)  $\mathrm{Bel}(A \mid B) = P^*(A \mid B)$  iff  $x_{13}+x_{123}+x_{134}+x_\Theta = 0$

(3)  $\max P(A \mid B) = P^*(A \mid B)$  iff  $x_{23}+x_{34}+x_{234} = 0$

Before turning to a discussion of the inequalities of
theorem 3, we show that they hold in general, and are not
restricted to the case of "certain" evidence.  Given two lemmas,
the proof of the general result (theorem 4 of the appendix) is
trivial.  The two lemmas themselves may not be without interest.

7.  The first lemma (lemma 1 of the appendix) states that
by expanding the frame of discernment $\Theta$, we can represent
the impact of uncertain evidence as the impact of "certain"
evidence.  This is not to say that we need to specify that
evidence; it is that there is an algorithm by means of which the
impact of the uncertain evidence can be represented as the impact
of other "certain" evidence.

The general idea of the argument is this.  Suppose that
$\Theta$ is the frame of discernment, and that our initial belief
function is $\mathrm{Bel}_1$.  The impact of uncertain evidence
can be represented by a simple support function $\mathrm{Bel}_2$
whose single focus is $C \in 2^\Theta$, to which $\mathrm{Bel}_2$
attributes mass $s$ (and therefore mass $1-s$ to $\Theta$).  To
give a representation by "certain" evidence, we split every atom
of $\Theta$ into two new atoms to obtain $\Theta'$.  We define a
new belief function on $\Theta'$, $\mathrm{Bel}_1'$, which is such
that

(a)  if $X \subset \Theta$, $\mathrm{Bel}_1'(X) = \mathrm{Bel}_1(X)$

(b)  if $X \subset \Theta$,
$(\mathrm{Bel}_1 \oplus \mathrm{Bel}_2)(X) = \mathrm{Bel}_1'(X \mid E)$,
where $E$ is a subset of $\Theta'$ such that the evidence
partially supporting $C$ provides total support for $E$.

Two remarks on this construction are in order.  First, we
have given no rule for finding the "possibility" $E$.  But in
general that should be no problem.  Suppose $C$ is the proposition
that there is a squirrel on the roof of the barn.  The light is
bad, so $\mathrm{Bel}_2$ assigns a mass of only .8 to $C$, and
assigns the remaining mass to $\Theta$.  We take $E$ in $\Theta'$
to be the proposition that it seems (.8) to be the case that there
is a squirrel on the roof, for which the evidence is conclusive.
The index 0.8 indicates the force of the seeming, and is reflected
in our assignment of masses in $\Theta'$.  In many situations it
seems quite natural to replace "uncertain evidence" by the
"certain" data on which it is based.

Even the case discussed by Diaconis and Zabell (1982) does
not seem too difficult.  The case is one in which we have one
degree of belief that a Shakespearean actor to be heard on a
record is Gielgud (say a half), but after hearing his voice for a
while, we come to have a degree of belief of .8 that it is
Gielgud.  It is quite true that we would be hard put to it to
describe in language the acoustic characteristics we come to
assign to that voice with probability 1 that in turn provide
evidence that it is the voice of Gielgud.  But we can always
refer to those characteristics as "the characteristics I have
been (consciously or unconsciously) reacting to".

Second, however, whether or not we can always do this is
unimportant for the comparison of Bayesian and Dempster
conditioning.  We can regard the introduction of $E$ to be merely
a computational device that helps us to compare the distribution
of masses in $\Theta$ according to the combined function
$\mathrm{Bel}_1 \oplus \mathrm{Bel}_2$ to the corresponding set of
Bayesian conditional distributions.

Lemma 2 of the appendix proves a corresponding fact about
Jeffrey's rule for uncertain evidence.*  It, too, may be
represented as the effect of (possibly artificial) "certain"
evidence.  The argument is similar.  Suppose our original degrees
of belief are defined over a certain field of propositions.  We
introduce a new elementary statement into that field, thereby
dividing each atom of the original field into two new atoms.  The
new elementary statement stands for that statement that, if it
were "certain", would have just the effect that our "uncertain"
evidence does.  We then show that the resulting new probabilities
obtained by conditionalizing on our new statement are exactly
those yielded by applying Jeffrey's rule to the shift in
probability of the "uncertain" evidence.

With these two results, and our previous theorem that shows
the relation of Dempster/Shafer and convex Bayes updating in the
case of "certain" evidence, it follows immediately that the
inequalities of theorem 3 hold whether or not the updating is
done on the basis of "certain" evidence.  In any case, the
intervals resulting from Dempster/Shafer updating will be
subintervals, and may be proper subintervals, of the intervals
resulting from the application of conditionalization to sets of
classical probability functions.

8.  Dempster/Shafer evidential updating, we have seen,
leads to more tightly constrained representations of rational
belief than does convex Bayesian updating.*  It might be thought
that this is a virtue.  But whether or not this is a Good Thing
is open to question.

Suppose that $D = D_1, \ldots, D_n$ are alternative decisions
open to you, and that you have a utility function defined over the
cross product of $D$ and the set $\Theta$ of possible states.  You
begin with a belief function, and you obtain some evidence.  If
you combine this evidence with your initial belief function
according to convex Bayesian conditionalization, your new beliefs
will be characterized by a set of probability functions $P_B$.  If
you perform the combination of evidence according to non-Bayesian
procedures, your new beliefs will be characterized by a set of
probability functions $P_D$ that is (in general) a proper subset
of $P_B$.

Given any probability function $P$ in either $P_D$ or $P_B$, you
can calculate the expected value of each decision: $E(D_i, P)$.
Let us say that $D_i$ is admissible relative to a set of
probability functions just in case there is some probability
function in the set according to which the expected value of $D_i$
is at least as great as the expected value of any other decision.
Since $P_D$ is included in $P_B$, the admissible decisions we
obtain if we update in a non-Bayesian way are included among those
we obtain if we update in a Bayesian way.
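
The definition of admissibility can be sketched as follows.  This is
an illustration under stated assumptions: each set of probability
functions is passed as a finite list of representative functions (a
genuinely convex set would have to be sampled or treated by linear
programming), and the decisions, states, and utilities are
hypothetical.

```python
from fractions import Fraction as F


def expected_value(utilities, prob):
    """Expected utility of one decision under one probability function."""
    return sum(prob[state] * utilities[state] for state in prob)


def admissible(decisions, prob_set):
    """Decisions that are optimal under at least one function in prob_set."""
    result = set()
    for prob in prob_set:
        values = {name: expected_value(u, prob)
                  for name, u in decisions.items()}
        best = max(values.values())
        result.update(name for name, v in values.items() if v == best)
    return result


# Hypothetical two-state problem: acting pays 1 in s1 and costs 1 in s2.
DECISIONS = {
    "act": {"s1": F(1), "s2": F(-1)},
    "abstain": {"s1": F(0), "s2": F(0)},
}
WIDE = [{"s1": p, "s2": 1 - p} for p in (F(2, 5), F(3, 5), F(7, 10))]
NARROW = WIDE[1:]  # a proper subset of WIDE

if __name__ == "__main__":
    print(sorted(admissible(DECISIONS, WIDE)))
    print(sorted(admissible(DECISIONS, NARROW)))
```

Because the narrower set is a subset of the wider one, every decision
admissible relative to the narrower set is admissible relative to the
wider set, as the argument above requires.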

There are three cases to consider.  (1) We obtain the same
set of admissible decisions by either updating procedure.  In
this case we have gained nothing by using the stronger procedure.
(2) If the non-Bayesian set $P_D$ leads to a set of admissible
decisions containing more than one member, then so does the
Bayesian set $P_B$, and we must in either case invoke additional
constraints in order to generate a unique decision.  (3) If $P_D$
leads to a unique admissible decision and $P_B$ does not, we
appear to have accomplished something useful by means of
non-Bayesian updating.

But it is open to question whether the added power should be
built into the evidential updating rule, or whether it should
appear as part of a decision procedure that takes us beyond the
evidence.  Many people feel that principles of evidence and
principles of decision should be kept distinct.

Consider an urn filled with black and white iron balls, some
of which are magnetized and some of which are not.  It is easy to
imagine that by extensive sampling, or by word of the
manufacturer, our statistical knowledge about the contents of the
urn may be as represented in table II of the appendix, where the
set of black balls is represented by $A$, and the set of
magnetized balls is represented by $B$.  Given that this is our
initial state, we may ask what our attitude should be toward the
proposition that a ball selected from the urn is magnetic, given
that it is white.

Dempster conditioning yields the degenerate interval
[0.8, 0.8].  Bayesian conditionalization yields the interval
[0.5, 0.8].  Suppose you are offered a ticket for $.75 that
returns a dollar if the ball is magnetic.  On the view identified
with Dempster and Shafer, it is not only permissible, but, given
the usual utility function, mandatory to buy it.  On the convex
Bayesian view either accepting or rejecting the offer would be
admissible.  It is true that, for all you know, the true
expectation is positive; but it is also true that, for all you
know, the true expectation is negative.  If everything you know is
true, the expected loss may still be $.25.
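
The arithmetic behind the ticket example is brief.  The sketch below
assumes utility linear in dollars; the function name is
illustrative.

```python
from fractions import Fraction


def expected_gain(p_magnetic, price=Fraction(3, 4), payoff=Fraction(1)):
    """Expected net gain of the ticket if P(magnetic | white) = p_magnetic."""
    return p_magnetic * payoff - price


if __name__ == "__main__":
    dempster = (Fraction(4, 5), Fraction(4, 5))  # degenerate interval [0.8, 0.8]
    bayes = (Fraction(1, 2), Fraction(4, 5))     # interval [0.5, 0.8]
    for name, (low, high) in (("Dempster", dempster), ("convex Bayes", bayes)):
        print(name, expected_gain(low), expected_gain(high))
```

At the single Dempster value the expected gain is positive, so buying
is the only admissible act; over the convex Bayesian interval the
expected gain ranges from a loss of a quarter to a small positive
gain, so both acts are admissible.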

On the other hand, there are cases where Dempster's rule of
combination leads to intuitively appealing results, but the
convex Bayes approach does not.  Suppose you know that 70% of
the soft berries in a certain area are good to eat, and that 60%
of the red berries are good to eat.  What are the chances that a
soft red berry is good to eat?  Dempster's rule yields .42/.54
= .78, which has intuitive appeal.  But the set of distributions
compatible with the conditions of the problem as they have been
stated leaves the probability of a soft red berry being good to
eat completely undetermined: it is the entire interval [0, 1].  It
is possible that 100% of the soft red berries are good, and it is
possible that 0% of the soft red berries are good.
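
The figure .42/.54 can be reproduced with a small implementation of
Dempster's rule of combination.  This is a sketch under an assumption
made explicit here: each frequency statement is encoded as a mass
function that puts all of its mass on the two atoms "good" and
"bad".  The function and variable names are illustrative.

```python
from fractions import Fraction as F


def combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozenset."""
    raw = {}
    conflict = F(0)
    for s1, w1 in m1.items():
        for s2, w2 in m2.items():
            meet = s1 & s2
            if meet:
                raw[meet] = raw.get(meet, F(0)) + w1 * w2
            else:
                conflict += w1 * w2  # mass falling on the empty set
    if conflict == 1:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalize by the mass that did not fall on the empty set.
    return {s: w / (1 - conflict) for s, w in raw.items()}


GOOD = frozenset({"good"})
BAD = frozenset({"bad"})
SOFT = {GOOD: F(7, 10), BAD: F(3, 10)}  # 70% of soft berries are good
RED = {GOOD: F(3, 5), BAD: F(2, 5)}     # 60% of red berries are good

if __name__ == "__main__":
    combined = combine(SOFT, RED)
    print(combined[GOOD], float(combined[GOOD]))
```

The conflicting mass is .28 + .18 = .46, so the product .42 is
renormalized by .54, giving 7/9.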

It  la  dear  that  la  applying  the  rula  of  eoabinatioo,  wc  arc 
implicitly  conatraiaing  tha  aat  of  (joint)  diatributiona  wc 
regard  aa  poaalbla.  Thin  la  auggaatad  by  Shafer'a  requirement 
that  the  items  of  evidence  to  ba  combined  be  "distinct"  or 
"independent".  Tha  moat  natural  sufficient  condition  that  leads 
to  tha  same  result  aa  Dempster's  rula  of  combination  la  that  all 
tha  probability  functions  in  our  convex  sat  satisfy  tha  three 
conditions: 

(i)  P(G)  -  1/2 

(ii)  P(S/G*R)  -  P(S/G) 

(ill)  P(£/G*R)  -  P(S/G) 

Condition (i), of course, is our old friend, the principle of
indifference. Conditions (ii) and (iii) represent conditional
independence, and it is not hard to imagine that we have warrant
for supposing they are satisfied.
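
As a check on this claim, here is a minimal sketch of the ordinary Bayesian calculation for the berry example under these assumptions: a prior of 1/2 that a berry is good, and softness and redness independent both given that the berry is good and given that it is not. The variable and function names are illustrative.

```python
from fractions import Fraction

# Statistics given in the example.
p_good_given_soft = Fraction(7, 10)
p_good_given_red = Fraction(6, 10)

# Principle of indifference: prior probability that a berry is good.
p_good = Fraction(1, 2)


def likelihood_ratio(p_good_given_e, prior):
    """P(e | good) / P(e | not good), recovered from Bayes' theorem."""
    posterior_odds = p_good_given_e / (1 - p_good_given_e)
    prior_odds = prior / (1 - prior)
    return posterior_odds / prior_odds


# Conditional independence lets the two likelihood ratios multiply.
odds = (
    p_good / (1 - p_good)
    * likelihood_ratio(p_good_given_soft, p_good)
    * likelihood_ratio(p_good_given_red, p_good)
)
p_good_given_soft_red = odds / (1 + odds)
print(p_good_given_soft_red, float(p_good_given_soft_red))
```

The result is 7/9, the same value as .42/.54. With a prior other than 1/2 the two likelihood ratios change and the agreement with Dempster's rule is lost.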

The exact necessary and sufficient conditions for agreement
between the two methods are that our set of probability functions
satisfy one of the two conditions

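(iv)  $\dfrac{P(G \wedge R \wedge S)}{P(\bar{G} \wedge R \wedge S)} = \dfrac{P(G \wedge R)\, P(G \wedge S)}{P(\bar{G} \wedge R)\, P(\bar{G} \wedge S)}$
or  (v)  $\dfrac{P(S \mid \bar{G} \wedge R)}{P(S \mid G \wedge R)} = \dfrac{P(\bar{G})}{P(G)} \cdot \dfrac{P(S \mid \bar{G})}{P(S \mid G)}$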

If our evidence is statistical in character, it clearly
behooves us to unpack the statistical assumptions underlying our
employment of non-Bayesian updating procedures. But what if our
evidence is not statistical in character?

One plausible response is that Dempster's rule of
combination is not designed for all cases in which you have
statistical data to serve as input. Sometimes the masses in the
belief function are determined by frequencies, and sometimes they
are not; only when they are not determined by frequencies should
we apply non-Bayesian updating. It is difficult to make a case
against this response except by making a case for the claim that
all responsible and useful probabilities, even very vague ones,
are based on statistical knowledge. But even granting the claim,
we must face the problem of how to treat evidence which is mixed,
that is, which contains both statistical components and intuitive
components. While it is a theorem that Dempster combination is
both commutative and associative, it is obviously not the case
that a mixture of Dempster and Bayesian methods need be
commutative and associative.
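
The commutativity and associativity of Dempster combination can be checked numerically. The sketch below uses three illustrative mass functions over a frame {a, b, c}; they are assumptions chosen for the check, not examples from the text. It tests only pure Dempster combination and says nothing about mixtures with Bayesian methods.

```python
from fractions import Fraction
from itertools import product


def dempster_combine(m1, m2):
    """Dempster's rule for mass functions given as dict: frozenset -> mass."""
    combined = {}
    conflict = Fraction(0)
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, Fraction(0)) + wa * wb
        else:
            conflict += wa * wb
    if conflict == 1:
        raise ValueError("totally conflicting evidence: combination undefined")
    return {s: w / (1 - conflict) for s, w in combined.items()}


# Illustrative mass functions (hypothetical values).
m1 = {frozenset("ab"): Fraction(1, 2), frozenset("abc"): Fraction(1, 2)}
m2 = {
    frozenset("a"): Fraction(1, 3),
    frozenset("bc"): Fraction(1, 3),
    frozenset("abc"): Fraction(1, 3),
}
m3 = {
    frozenset("b"): Fraction(1, 4),
    frozenset("ac"): Fraction(1, 4),
    frozenset("abc"): Fraction(1, 2),
}

commutative = dempster_combine(m1, m2) == dempster_combine(m2, m1)
associative = dempster_combine(dempster_combine(m1, m2), m3) == dempster_combine(
    m1, dempster_combine(m2, m3)
)
print(commutative, associative)
```

Exact rational arithmetic is used so that the two orders of combination can be compared with `==` rather than with a floating-point tolerance.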

It should be strongly emphasized that the present arguments
are not intended as arguments in favor of the general
applicability of convex Bayesian conditionalization. Rather,
what I have shown is (1) that the representation of belief states
by distributions of masses over subsets of a set $\Theta$ of
possibilities is a special case of the convex Bayesian
representation in terms of simple classical probabilities over
the atoms of $\Theta$, (2) that the treatments of uncertain evidence in
both Bayesian and non-Bayesian updating are reducible to the
corresponding treatments of certain evidence, and (3) that
non-Bayesian updating yields more determinate belief states as
outcomes, but that the benefits afforded by non-Bayesian updating
are limited and questionable.

APPENDIX

Theorem 1:

Let $m$ be a probability mass function defined over a frame of
discernment $\Theta$. Let Bel be the corresponding belief function,

$$\mathrm{Bel}(X) = \sum_{A \subseteq X} m(A).$$

Then there is a closed, convex set of classical probability functions
$S_P$ defined over the atoms of $\Theta$ such that for every subset $X$ of $\Theta$,

$$\mathrm{Bel}(X) = \min_{P \in S_P} P(X).$$

Proof: Let $S_P$ be the set of classical probability functions $P$ defined
on the atoms of $\Theta$ such that for every $X \subseteq \Theta$,
$\mathrm{Bel}(X) \le P(X) \le 1 - \mathrm{Bel}(\bar{X})$.
$S_P$ is closed, since $P(X) = \mathrm{Bel}(X)$, $P(\bar{X}) = 1 - \mathrm{Bel}(X)$ is a classical
probability function. $S_P$ is convex, since for $0 < a < 1$, $aP_1(X) + (1-a)P_2(X)$
lies between $\mathrm{Bel}(X)$ and $1 - \mathrm{Bel}(\bar{X})$ whenever $P_1(X)$ and $P_2(X)$ do. Since for any given
$X$ there is a $P \in S_P$ such that $P(X) = \mathrm{Bel}(X)$, $\mathrm{Bel}(X) \ge \min_{P \in S_P} P(X)$. And
$\min_{P \in S_P} P(X) \ge \mathrm{Bel}(X)$, since this inequality holds for every $P \in S_P$.

To show that $S_P$ is non-empty, it suffices to show that there
is a classical probability function $P$ such that for every $X \subseteq \Theta$,
$\mathrm{Bel}(X) \le P(X)$, since if this is so, then $\mathrm{Bel}(\bar{X}) \le P(\bar{X})$ and
$1 - \mathrm{Bel}(\bar{X}) \ge 1 - P(\bar{X}) = P(X)$.

Suppose the atoms of $\Theta$ are ordered lexicographically. For
every set $X$, $X \subseteq \Theta$, add the mass assigned to $X$, $m(X)$, to the mass
assigned to $\{a_X\}$, where $a_X$ is the lexicographically earliest atom
in $X$. Let the new mass function be $m'$. Define

$$P(X) = \sum_{a \in X} m'(\{a\}).$$

$P(\emptyset) = 0$; $P(\Theta) = 1$, since all the original mass ends up on the atoms,
and $P(X) \ge \mathrm{Bel}(X)$, since the mass assigned to any subset of $X$
ends up on the atoms of $X$.

□
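
The construction in the last step of the proof can be sketched in a few lines. The frame and the mass values below are illustrative assumptions; `push_to_earliest_atom` is a hypothetical helper name for the step that moves each set's mass onto its lexicographically earliest atom.

```python
from fractions import Fraction
from itertools import combinations

ATOMS = ("a", "b", "c")

# Illustrative mass function over subsets of the frame (hypothetical values).
MASS = {
    frozenset("a"): Fraction(1, 4),
    frozenset("bc"): Fraction(1, 4),
    frozenset("ab"): Fraction(1, 4),
    frozenset("abc"): Fraction(1, 4),
}


def bel(x):
    """Bel(X): total mass of the focal sets contained in X."""
    return sum((w for s, w in MASS.items() if s <= x), Fraction(0))


def push_to_earliest_atom(mass):
    """Move the mass of every focal set onto its lexicographically earliest atom."""
    p = dict.fromkeys(ATOMS, Fraction(0))
    for focal_set, weight in mass.items():
        p[min(focal_set)] += weight
    return p


def all_subsets(atoms):
    for size in range(len(atoms) + 1):
        for combo in combinations(atoms, size):
            yield frozenset(combo)


P = push_to_earliest_atom(MASS)

# P is a classical probability function that dominates Bel on every subset.
dominates = all(
    sum((P[a] for a in x), Fraction(0)) >= bel(x) for x in all_subsets(ATOMS)
)
print(P, dominates)
```

Any other rule for choosing one atom per focal set would serve equally well; the lexicographic order only makes the choice definite.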
Theorem 2:

If $S_P$ is a closed convex set of classical probability functions
defined over the atoms of $\Theta$, and for every $A, B \subseteq \Theta$,

$$\min_{P \in S_P} P(A \cup B) \ge \min_{P \in S_P} P(A) + \min_{P \in S_P} P(B) - \min_{P \in S_P} P(A \cap B),$$

then there is a mass function $m$ defined over
the subsets of $\Theta$ such that for every $X \subseteq \Theta$, the corresponding Bel
function satisfies

$$\mathrm{Bel}(X) = \min_{P \in S_P} P(X).$$

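Proof: Since $S_P$ is closed and convex, for every $X \subseteq \Theta$ there is
a $P \in S_P$ such that $P(X) = \min_{P \in S_P} P(X)$. For every $X \subseteq \Theta$, define
$P_*(X)$ to be $\min_{P \in S_P} P(X)$.

By Shafer's Theorem 2.1, if $\Theta$ is a frame of discernment, a
function $\mathrm{Bel}: 2^\Theta \to [0,1]$ is a belief function if and only if

(1)  $\mathrm{Bel}(\emptyset) = 0$; correspondingly, $P_*(\emptyset) = 0$.

(2)  $\mathrm{Bel}(\Theta) = 1$; correspondingly, $P_*(\Theta) = 1$.

(3)  For every positive integer $n$ and every collection $A_1, \ldots, A_n$
of subsets of $\Theta$,

$$\mathrm{Bel}(A_1 \cup \ldots \cup A_n) \ge \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ I \ne \emptyset}} (-1)^{|I|+1}\, \mathrm{Bel}\Big(\bigcap_{i \in I} A_i\Big).$$

Since Shafer's theorem 2.2 gives an algorithm to recapture the mass
function from the belief function, we need merely establish (3) for
our function $P_*$:

(3')  $$P_*(A_1 \cup \ldots \cup A_n) \ge \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ |I| \ge 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

Suppose, on the contrary, (3') fails. Then there is a specific collection
$A_1, \ldots, A_n$, of smallest cardinality $n$, for which (3') is false, i.e.,

$$P_*(A_1 \cup \ldots \cup A_n) < \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ |I| \ge 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

But

$$P_*(A_1 \cup \ldots \cup A_n) \ge P_*(A_n) + P_*(A_1 \cup \ldots \cup A_{n-1}) - P_*\big((A_1 \cup \ldots \cup A_{n-1}) \cap A_n\big)$$

by the hypothesis of the theorem. Now
$(A_1 \cup \ldots \cup A_{n-1}) \cap A_n = (A_1 \cap A_n) \cup \ldots \cup (A_{n-1} \cap A_n)$,
and by hypothesis, (3') holds for collections of cardinality $(n-1)$. Thus

(4)  $$P_*\big((A_1 \cap A_n) \cup (A_2 \cap A_n) \cup \ldots \cup (A_{n-1} \cap A_n)\big) \ge \sum_{I \subseteq \{1,\ldots,n-1\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} (A_i \cap A_n)\Big)$$

and

(5)  $$P_*(A_1 \cup \ldots \cup A_{n-1}) \ge \sum_{I \subseteq \{1,\ldots,n-1\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

Let us compute

$$\sum_{I \subseteq \{1,\ldots,n\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big).$$

We evaluate the sum by cases: $|I| = 1$; $|I| > 1$ and $n \notin I$; and $|I| > 1$
and $n \in I$, in each case writing the result in general terms for ease of collection.

$|I| = 1$:

$$P_*(A_n) + \sum_{\substack{I \subseteq \{1,\ldots,n-1\} \\ |I| = 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big) = \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ |I| = 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big)$$

$|I| > 1$, $I \subseteq \{1,\ldots,n-1\}$:

$$\sum_{\substack{I \subseteq \{1,\ldots,n-1\} \\ |I| > 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big)$$

$|I| > 1$, $n \in I$: $I = I' \cup \{n\}$, $I' \subseteq \{1,\ldots,n-1\}$:

$$\sum_{I' \subseteq \{1,\ldots,n-1\}} (-1)^{|I'|+2}\, P_*\Big(\bigcap_{i \in I'} A_i \cap A_n\Big) = \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ n \in I,\ |I| > 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big)$$

Combining the three terms, we have

$$P_*(A_1 \cup \ldots \cup A_n) \ge \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ |I| = 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big) + \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ n \notin I,\ |I| > 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big) + \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ n \in I,\ |I| > 1}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big)$$

$$= \sum_{I \subseteq \{1,\ldots,n\}} (-1)^{|I|+1}\, P_*\Big(\bigcap_{i \in I} A_i\Big),$$

contradicting our assumption that there was an $n$ for which (3')
was false.

Theorem 3: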

Let $\Theta$ be a frame of discernment, Bel a belief function, and $S_P$ the
corresponding set of Bayesian probability functions. Let $B$ be evidence
assigned probability 1, or support 1, and suppose $P(B) > 0$ for every
$P \in S_P$. Then for every $A \subseteq \Theta$,

$$\min_{P \in S_P} P(A \mid B) \le \mathrm{Bel}(A \mid B) \le P^*(A \mid B) \le \max_{P \in S_P} P(A \mid B)$$

where $P^*(A \mid B) = 1 - \mathrm{Bel}(\bar{A} \mid B)$ is Shafer's plausibility function.

Proof: (All maxima and minima are taken over $P \in S_P$.)

Let $R \in S_P$ be such that $R(A \cap B) = \max P(A \cap B)$. Then

$$\max P(A \mid B) \ge \frac{R(A \cap B)}{R(B)} \ge \frac{\max P(A \cap B)}{\max P(B)} = P^*(A \mid B).$$
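
The chain of inequalities in Theorem 3 can be illustrated numerically. The sketch below rests on several assumptions not stated in the text: the mass function is illustrative; the extreme points of the set of compatible probability functions are generated by assigning each focal set's mass to its first atom under each ordering of the atoms; the extremes of a conditional probability over that set are attained at extreme points; and Dempster conditioning is computed as Bel(A|B) = (Bel(A ∪ ~B) - Bel(~B)) / (1 - Bel(~B)), with plausibility Pl(A|B) = Pl(A ∩ B) / Pl(B).

```python
from fractions import Fraction as F
from itertools import permutations

ATOMS = (1, 2, 3, 4)
THETA = frozenset(ATOMS)

# Illustrative mass function (hypothetical values).
MASS = {
    frozenset({1}): F(2, 10),
    frozenset({2}): F(2, 10),
    frozenset({3}): F(1, 10),
    frozenset({4}): F(2, 10),
    frozenset({1, 2}): F(1, 10),
    frozenset({1, 4}): F(1, 10),
    frozenset({2, 4}): F(1, 10),
}


def bel(x):
    return sum((w for s, w in MASS.items() if s <= x), F(0))


def pl(x):
    return sum((w for s, w in MASS.items() if s & x), F(0))


def extreme_points():
    """One probability function per ordering: each focal set feeds its first atom."""
    for order in permutations(ATOMS):
        p = dict.fromkeys(ATOMS, F(0))
        for focal_set, weight in MASS.items():
            p[next(a for a in order if a in focal_set)] += weight
        yield p


def prob(p, x):
    return sum((p[a] for a in x), F(0))


A = frozenset({1})
B = frozenset({1, 2})
not_b = THETA - B

conditionals = [prob(p, A & B) / prob(p, B) for p in extreme_points()]
lower, upper = min(conditionals), max(conditionals)

bel_a_given_b = (bel(A | not_b) - bel(not_b)) / (1 - bel(not_b))
pl_a_given_b = pl(A & B) / pl(B)

print(lower, bel_a_given_b, pl_a_given_b, upper)
```

For these values the Bayesian interval is [1/3, 2/3], while Dempster conditioning gives the narrower interval [3/7, 4/7], in line with the theorem.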


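Lemma 1: Let $\Theta$ be a frame of discernment. Let our initial belief
function be $\mathrm{Bel}_1$. We obtain new evidence whose impact on the frame
of discernment $\Theta$ can be represented by a simple support function
(Shafer 1976, p. 7) $\mathrm{Bel}_C$ whose single focus is $C \subseteq \Theta$. $\mathrm{Bel}_C$ attributes
mass $s$ to $C$ and mass $(1-s)$ to $\Theta$.

Let the foci of $\mathrm{Bel}_1$, the subsets $A$ of $\Theta$ receiving mass $m_1(A) > 0$,
be $A_1, A_2, \ldots, A_n$. We can construct a new frame of discernment $\Theta'$ and
a new belief function $\mathrm{Bel}_1'$ such that

(a)  For every $X \subseteq \Theta$, $\mathrm{Bel}_1'(X) = \mathrm{Bel}_1(X)$.

(b)  For every $X \subseteq \Theta$, $(\mathrm{Bel}_1 \oplus \mathrm{Bel}_C)(X) = \mathrm{Bel}_1'(X \mid E)$, where $E \in 2^{\Theta'}$, and
the evidence partially supporting $C$ provides total support for
$E$. $\mathrm{Bel}_1 \oplus \mathrm{Bel}_C$ represents the application of Dempster's rule of
combination to $\mathrm{Bel}_1$ and $\mathrm{Bel}_C$; $\mathrm{Bel}_1'(X \mid E)$ represents Dempster's
rule of conditioning on $E$, the analog of Bayesian
conditionalization (Shafer 1976 p. 67).

Proof: Let $e$ be new to $\Theta$, and for every $a \in \Theta$ generate two
"possibilities" $ae$ and $a\bar{e}$. Let $\Theta' = \{ae, a\bar{e} : a \in \Theta\}$ and let
$E = \{ae : a \in \Theta\}$. A subset $X$ of $\Theta$ is identified with the subset
$\{ae, a\bar{e} : a \in X\}$ of $\Theta'$. Since the evidence that supports $C$ is to
render $E$ certain, we have $C' \subseteq E$, i.e. $C' = \{ae : a \in C\}$.

We define $\mathrm{Bel}_1'$ on the basis of $m_1'$ as follows:

$\mathrm{Bel}_1'$ has $n$ foci of the form $A_i$, each with mass $(1-s)\,m_1(A_i)$, where
$m_1$ is the mass function associated with $\mathrm{Bel}_1$.

For every $i$ such that $A_i \cap C' = \emptyset$, $A_i \cap \bar{E}$ is to be a focus with mass
$s \cdot m_1(A_i)$. For convenience we take the first $p$ of the $A_i$ to be those for
which $A_i \cap C' = \emptyset$. Note that $p$ may be 0, but cannot be $n$, since $\mathrm{Bel}_1 \oplus \mathrm{Bel}_C$
would be undefined.

The remaining $n - p$ of the $A_i$ give rise to the remaining foci. These are of the
form $(A_i \cap C') \cup (A_i \cap \bar{E})$, and receive the remaining mass. Since
$(A_i \cap C') \cup (A_i \cap \bar{E}) = (A_j \cap C') \cup (A_j \cap \bar{E})$ is a possibility for $i \ne j$, we write

$$m_1'\big((A_i \cap C') \cup (A_i \cap \bar{E})\big) = \sum_{\{j :\ (A_j \cap C') \cup (A_j \cap \bar{E}) = (A_i \cap C') \cup (A_i \cap \bar{E})\}} s \cdot m_1(A_j).$$

Note that the total mass of the foci of this form is $\sum_{i=p+1}^{n} s \cdot m_1(A_i)$, since
these sets have positive mass only if $A_i \cap C' \ne \emptyset$.

We first show that $\mathrm{Bel}_1'$ is a belief function. Obviously its mass
function $m_1'$ is non-negative for every $A \subseteq \Theta'$, so we need only show that
$\sum_{A \subseteq \Theta'} m_1'(A) = 1$. Summing over the three kinds of foci, we have:

$$\sum_{A \subseteq \Theta'} m_1'(A) = \sum_{i=1}^{n} (1-s)\,m_1(A_i) + \sum_{i=1}^{p} s \cdot m_1(A_i) + \sum_{i=p+1}^{n} s \cdot m_1(A_i) = 1.$$

We next show that $\mathrm{Bel}_1'$ is equivalent to $\mathrm{Bel}_1$, i.e. that for any
$X \subseteq \Theta$, $\mathrm{Bel}_1'(X) = \mathrm{Bel}_1(X)$.

$$\mathrm{Bel}_1'(X) = \sum_{A \subseteq X} m_1'(A) = \sum_{A_i \subseteq X} m_1'(A_i) + \sum_{A_i \cap \bar{E} \subseteq X} m_1'(A_i \cap \bar{E}) + \sum_{(A_i \cap C') \cup (A_i \cap \bar{E}) \subseteq X} m_1'\big((A_i \cap C') \cup (A_i \cap \bar{E})\big)$$

The first term yields $\sum_{A_i \subseteq X} (1-s)\,m_1(A_i) = (1-s) \sum_{A_i \subseteq X} m_1(A_i)$.

Since $X = (X \cap E) \cup (X \cap \bar{E})$, $A_i \cap \bar{E} \subseteq X \cap \bar{E}$ if and only if $A_i \subseteq X$, in view of the
fact that $a\bar{e} \in A_i \cap \bar{E}$ if and only if $a \in A_i$, and the same holds for $X$. Thus
the second term yields

$$\sum_{\substack{A_i \subseteq X \\ 1 \le i \le p}} s \cdot m_1(A_i).$$

To evaluate the third term, we claim that $(A_i \cap C') \cup (A_i \cap \bar{E}) \subseteq X$ if and only
if $A_i \subseteq X$. If $A_i \subseteq X$, then $A_i \cap C' \subseteq X$ and $A_i \cap \bar{E} \subseteq X$ and so
$(A_i \cap C') \cup (A_i \cap \bar{E}) \subseteq X$. Suppose $(A_i \cap C') \cup (A_i \cap \bar{E}) \subseteq X$. Then
$A_i \cap \bar{E} \subseteq X$, $A_i \cap \bar{E} \subseteq X \cap \bar{E}$, and by the preceding argument $A_i \subseteq X$. Thus
the third term yields

$$\sum_{\substack{A_i \subseteq X \\ p < i \le n}} s \cdot m_1(A_i).$$

Putting the three parts together, we have $\mathrm{Bel}_1'(X) = \mathrm{Bel}_1(X)$.

We now show that conditioning on $E$ in the frame of discernment $\Theta'$
is equivalent to combining uncertain evidence $C$ with $\mathrm{Bel}_1$ in the frame
of discernment $\Theta$ according to Dempster's rule of combination:
for every $X \subseteq \Theta$, $(\mathrm{Bel}_1 \oplus \mathrm{Bel}_C)(X) = \mathrm{Bel}_1'(X \mid E)$.

(1)  $$(\mathrm{Bel}_1 \oplus \mathrm{Bel}_C)(X) = \frac{\displaystyle\sum_{\substack{A_i \cap C \ne \emptyset \\ A_i \cap C \subseteq X}} m_1(A_i) \cdot s + \sum_{A_i \subseteq X} m_1(A_i)(1-s)}{\displaystyle 1 - \sum_{A_i \cap C = \emptyset} m_1(A_i) \cdot s}$$

(The numerator comprises two sums, since $\mathrm{Bel}_C$ has two foci: $C$ and $\Theta$
with masses $s$ and $(1-s)$ respectively.)

(2)  $$\mathrm{Bel}_1'(X \mid E) = \frac{\displaystyle\sum_{A \subseteq X \cup \bar{E}} m_1'(A) - \sum_{A \subseteq \bar{E}} m_1'(A)}{\displaystyle 1 - \sum_{A \subseteq \bar{E}} m_1'(A)}$$

$\sum_{A \subseteq \bar{E}} m_1'(A) = \sum_{i=1}^{p} s \cdot m_1(A_i)$: only the foci of the form $A_i \cap \bar{E}$
are included in $\bar{E}$: $A_i = (A_i \cap E) \cup (A_i \cap \bar{E})$ is not included in $\bar{E}$, and
since $C' \subseteq E$, $(A_i \cap C') \cup (A_i \cap \bar{E})$ is included in $\bar{E}$ only if $A_i \cap C' = \emptyset$, in
which case it has no mass. Moreover,

$$\sum_{i=1}^{p} s \cdot m_1(A_i) = \sum_{A_i \cap C = \emptyset} m_1(A_i) \cdot s.$$

Hence the denominators of (1) and (2) are the same.

It remains to evaluate $\sum_{A \subseteq X \cup \bar{E}} m_1'(A)$. Consider foci of the form
$A_i$. $A_i \subseteq X \cup \bar{E}$ if and only if $A_i \subseteq X$, so these foci yield mass

$$\sum_{A_i \subseteq X} m_1'(A_i) = \sum_{A_i \subseteq X} (1-s)\,m_1(A_i),$$

corresponding to the right hand term in the numerator of (1).

Consider foci of the form $A_i \cap \bar{E}$. All of these are included in
$X \cup \bar{E}$; they yield

$$\sum_{i=1}^{p} s \cdot m_1(A_i) = \sum_{A \subseteq \bar{E}} m_1'(A),$$

so they drop out of the numerator of (2).

Finally, consider foci of the form $(A_i \cap C') \cup (A_i \cap \bar{E})$. We first show
that $(A_i \cap C') \cup (A_i \cap \bar{E}) \subseteq X \cup \bar{E}$ if and only if $A_i \cap C' \subseteq X$. Suppose
$(A_i \cap C') \cup (A_i \cap \bar{E}) \subseteq X \cup \bar{E}$. Then $A_i \cap C' \subseteq X \cup \bar{E}$. But $C' \subseteq E$, so
$A_i \cap C' = A_i \cap C' \cap E \subseteq X \cup \bar{E}$ only if $A_i \cap C' \subseteq X$. Suppose $A_i \cap C' \subseteq X$. Then since
$A_i \cap \bar{E} \subseteq \bar{E} \subseteq X \cup \bar{E}$, $(A_i \cap C') \cup (A_i \cap \bar{E}) \subseteq X \cup \bar{E}$.

We compute the mass in the numerator of (2) due to foci of
this sort. They have mass only when $A_i \cap C' \ne \emptyset$. And then they have
mass

$$\sum_{\{j :\ (A_j \cap C') \cup (A_j \cap \bar{E}) = (A_i \cap C') \cup (A_i \cap \bar{E})\}} s \cdot m_1(A_j);$$

each $A_i$ such that $A_i \cap C' \subseteq X$ contributes $s \cdot m_1(A_i)$. Their total mass
is therefore

$$\sum_{\substack{A_i \cap C' \ne \emptyset \\ A_i \cap C' \subseteq X}} s \cdot m_1(A_i),$$

corresponding to the first term of the numerator of (1).

We have therefore shown that $(\mathrm{Bel}_1 \oplus \mathrm{Bel}_C)(X) = \mathrm{Bel}_1'(X \mid E)$.

□

Lemma 2: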

Suppose that $P_0$ is an assignment of probabilities to the field
of propositions $F$ whose basis is $a_1, a_2, \ldots, a_n$. Let $P_1$ be generated
by a shift in the probability assigned to $A$; this shift is the
source of our new probability $P_1$. By Jeffrey's rule, for all $X \in F$,

$$P_1(X) = P_0(X \mid A) \cdot P_1(A) + P_0(X \mid \bar{A}) \cdot P_1(\bar{A}).$$

Then there exists a new field of propositions $F'$, and a proposition
$E$, and a new probability function $P_0'$ defined on $F'$ such that for every
proposition $X$ in the old field $F$

(a)  $P_0'(X) = P_0(X)$

(b)  $P_0'(X \mid E) = P_1(X)$

Proof:

Add a new atomic proposition $e$ to the basis of $F$ to obtain
the field $F'$, and represent it by $E$. We impose the constraint $P_0'(E) > 0$;
otherwise $P_0'(E)$ may have any value that strikes our fancy.

We extend $P_0$ so that for any $X \in F$, $P_0'(X) = P_0(X)$; $P_0'$ is thus
equivalent to $P_0$, so far as $F$ is concerned, before we obtain information
about $A$. Specifically, set

$$k = \frac{P_1(A)}{P_0(A)}; \qquad k' = \frac{1 - P_1(A)}{1 - P_0(A)}.$$

For $X \in F$, set

$$P_0'(X \wedge E) = P_0'(E) \cdot \big[k\,P_0(X \wedge A) + k'\,P_0(X \wedge \bar{A})\big]$$

$$P_0'(X \wedge \bar{E}) = P_0(X) - P_0'(X \wedge E)$$

Clearly, for $X \in F$,

$$P_0'(X) = P_0'(X \wedge E) + P_0'(X \wedge \bar{E}) = P_0(X).$$

We now show that for $X \in F$, probabilities conditional on $E$ are equal
to the probabilities given by Jeffrey's rule: $P_1(X) = P_0'(X \mid E)$.

For $X \in F$,

$$P_0'(X \mid E) = \frac{P_0'(X \wedge E)}{P_0'(E)} = \frac{P_1(A)}{P_0(A)}\,P_0(X \wedge A) + \frac{P_1(\bar{A})}{P_0(\bar{A})}\,P_0(X \wedge \bar{A}) = P_1(X).$$

□
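
The construction in Lemma 2 can be checked on a small case. The sketch below assumes a three-atom basis, a shift of the probability of A from 1/2 to 3/4, and the value 1/2 for the prior probability of the new proposition E; all of these numbers and names are illustrative. The prior probability of E must be small enough that no extended atom receives negative probability.

```python
from fractions import Fraction as F
from itertools import combinations

# Illustrative prior over three atoms, and a shift in the probability of A.
P0 = {"a1": F(1, 2), "a2": F(1, 4), "a3": F(1, 4)}
A = frozenset({"a1"})
P1_A = F(3, 4)   # new probability of A
P_E = F(1, 2)    # assumed prior probability of the new proposition E

P0_A = sum(P0[a] for a in A)
k = P1_A / P0_A
k_bar = (1 - P1_A) / (1 - P0_A)

# Extended atoms: (atom, True) stands for atom & E, (atom, False) for atom & not-E.
P0_ext = {}
for atom, p in P0.items():
    scale = k if atom in A else k_bar
    P0_ext[(atom, True)] = P_E * scale * p
    P0_ext[(atom, False)] = p - P0_ext[(atom, True)]


def jeffrey(x):
    """Jeffrey's rule applied directly to the original prior."""
    in_a = sum((P0[a] for a in x if a in A), F(0))
    out_a = sum((P0[a] for a in x if a not in A), F(0))
    return in_a / P0_A * P1_A + out_a / (1 - P0_A) * (1 - P1_A)


def marginal(x):
    """Probability of X in the extended field, before conditioning."""
    return sum((P0_ext[(a, flag)] for a in x for flag in (True, False)), F(0))


def given_e(x):
    """Probability of X in the extended field, conditional on E."""
    total_e = sum((P0_ext[(a, True)] for a in P0), F(0))
    return sum((P0_ext[(a, True)] for a in x), F(0)) / total_e


events = [
    frozenset(c) for r in range(len(P0) + 1) for c in combinations(sorted(P0), r)
]
condition_a = all(marginal(x) == sum((P0[a] for a in x), F(0)) for x in events)
condition_b = all(given_e(x) == jeffrey(x) for x in events)
nonnegative = all(v >= 0 for v in P0_ext.values())
print(condition_a, condition_b, nonnegative)
```

Here `condition_a` and `condition_b` correspond to clauses (a) and (b) of the lemma, checked over every event in the original field.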

Theorem 4:

Let a distribution of beliefs be given both by the function
$\mathrm{Bel}_1$ and by the prior set of probability distributions $S_{P_1}$. Suppose
new evidence is obtained whose impact is given by a simple support
function $\mathrm{Bel}_A$ assigning positive mass to $A$ and $\Theta$, or, alternatively,
by a shift in the probability of $A$ on each of the distributions
in $S_{P_1}$; let $S_{P_2}$ be the result of propagating this shift by Jeffrey's
rule, and let $\mathrm{Bel}_2$ be the result of applying Dempster's rule of combination.
Then

$$\min_{P \in S_{P_2}} P(X) \le \mathrm{Bel}_2(X) \le 1 - \mathrm{Bel}_2(\bar{X}) \le \max_{P \in S_{P_2}} P(X)$$

for all subsets $X$ of $\Theta$.

Proof: Immediate from Lemmas 1 and 2, and Theorem 3.


Table 1

Set        Mass    Lower Measure                        Upper Measure

①          x1      x1                                   1-x2-x3-x4-x23-x24-x34-x234
②          x2      x2                                   1-x1-x3-x4-x13-x14-x34-x134
③          x3      x3                                   1-x1-x2-x4-x12-x14-x24-x124
④          x4      x4                                   1-x1-x2-x3-x12-x13-x23-x123
① ∪ ②      x12     x1+x2+x12                            1-x3-x4-x34
① ∪ ③      x13     x1+x3+x13                            1-x2-x4-x24
① ∪ ④      x14     x1+x4+x14                            1-x2-x3-x23
② ∪ ③      x23     x2+x3+x23                            1-x1-x4-x14
② ∪ ④      x24     x2+x4+x24                            1-x1-x3-x13
③ ∪ ④      x34     x3+x4+x34                            1-x1-x2-x12
① ∪ ② ∪ ③  x123    x1+x2+x3+x12+x13+x23+x123            1-x4
① ∪ ② ∪ ④  x124    x1+x2+x4+x12+x14+x24+x124            1-x3
① ∪ ③ ∪ ④  x134    x1+x3+x4+x13+x14+x34+x134            1-x2
② ∪ ③ ∪ ④  x234    x2+x3+x4+x23+x24+x34+x234            1-x1
Θ          xΘ      1                                    1

xΘ = 1 - Σ xi

Table II

A: white    B: magnetic

Set     mass (Bel)    Frequency

x1      0.2           [0.2, 0.4]
x2      0.2           [0.2, 0.4]
x3      0.1           [0.1, 0.1]
x4      0.2           [0.2, 0.4]
x12     0.1           [0.5, 0.7]
x13     0.0           [0.3, 0.5]
x14     0.1           [0.5, 0.7]
x23     0.0           [0.3, 0.5]
x24     0.1           [0.5, 0.7]
x34     0.0           [0.3, 0.5]
x123    0.0           [0.6, 0.8]
x124    0.0           [0.9, 0.9]
x134    0.0           [0.6, 0.8]
x234    0.0           [0.6, 0.8]
Θ       0.0           [1.0, 1.0]
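
The intervals in Table II can be recomputed from the mass column. The sketch below assumes that the lower bound of each interval is the belief (the total mass of the subsets of the set) and that the upper bound is one minus the belief of the complement; the function names are illustrative.

```python
from fractions import Fraction as F

ATOMS = frozenset({1, 2, 3, 4})

# Nonzero masses from the mass column of Table II.
MASS = {
    frozenset({1}): F(2, 10),
    frozenset({2}): F(2, 10),
    frozenset({3}): F(1, 10),
    frozenset({4}): F(2, 10),
    frozenset({1, 2}): F(1, 10),
    frozenset({1, 4}): F(1, 10),
    frozenset({2, 4}): F(1, 10),
}


def bel(x):
    """Lower measure: total mass of the focal sets contained in x."""
    return sum((w for s, w in MASS.items() if s <= x), F(0))


def interval(x):
    """[lower, upper] bounds for the set x."""
    x = frozenset(x)
    return bel(x), 1 - bel(ATOMS - x)


for label in ([1], [3], [1, 2], [1, 3], [1, 2, 3], [1, 2, 4]):
    low, high = interval(label)
    print(label, float(low), float(high))
```

For the set x124 both bounds come out as 0.9, because the complementary atom x3 carries mass 0.1 and appears in no larger focal set.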
NOTES

*  Research for this paper was supported in part by the U.S. Army
Signals Warfare Laboratory, and was stimulated in large part by
conversations with Jerry Feldman and Ron Loui of the Department
of Computer Science at the University of Rochester. Judea Pearl
carried out his duties as referee with exemplary efficiency and
offered much good advice. I hope I have succeeded in following
it. An anonymous referee pointed out an error in the original
proof of Theorem 3, and provided a suggestion to correct it.

1.  This approach is similar to that of Smith (1961). It is also
similar to the approach of Levi (1974, 1981), Good (1962), and
Kyburg (1974), but as Levi points out in (1981) there are
important differences. Levi represents a credal state by a set
of conditional probability functions, Q(x,y). For every y
consistent with background knowledge, the set of functions Q(x,y)
is convex. Since distinct convex sets of conditional probability
functions give rise to the same convex sets of absolute
probability functions, the two representations are not
equivalent. Smith and Kyburg represent a credal state by the
convex closure of all probabilities consistent with a set of
probability intervals. Shafer, as will be seen, implicitly
offers the same characterization. Dempster (1968) offers a more
restricted characterization: the convex set representing the
credal state is the largest that both satisfies the interval
constraints, and can be obtained from a space of "simple joint
propositions" in a certain way. Levi has shown (1981, pp. 338-
392) that these additional restrictions are incompatible with
certain natural forms of direct inference of probabilities from
known statistics.

2.  In another place I shall argue that we can found all our
probabilities on direct or indirect statistical inference, or on
set-theoretical truths. No other source is needed.

3.  This example was suggested in conversation by Teddy
Seidenfeld.

4.  This result was stated informally by Levi (1967), and is
reflected in Diaconis and Zabell, 1982, Theorem 2.1.

5.  Dempster (1967, 1968) was well aware that his rule of
combination led to results stronger than those that would be
given by a mere generalization of orthodox Bayesian inference.
His reasons for preferring the rule at which he arrives are
essentially philosophical: in an orthodox Bayesian framework,
unless you restrict the family of priors, you don't get useful
results starting with zero information. But in expert systems,
we have no desire or need to start with zero information.

6.  Quinlan's  (1982)  subtitle  suggests  the  opposite:  "A  cautious 
approach  to  uncertain  inference." 


7.  It is not clear that Shafer's belief functions were intended
to be used in a decision-theoretic context. Even if they were,
there would be serious difficulties standing in the way of such
employment. (See Levi (1978, 1980, 1983), and Seidenfeld
(1978)). For present purposes, these difficulties need not
concern us.

8.  This corresponds to Levi's notion (1981) of E-admissibility.

9.  This elegant and simple example was proposed by Jerry
Feldman.
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